-
Notifications
You must be signed in to change notification settings - Fork 7.1k
feat(ci): wanda ray image builds, uploads to Dockerhub #59936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: andrew/revup/master/push-script
Are you sure you want to change the base?
feat(ci): wanda ray image builds, uploads to Dockerhub #59936
Conversation
|
02a8921 to
fe68786
Compare
f104bf5 to
288cadb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully migrates the Ray Docker image builds to use wanda, which simplifies the CI configuration. The new wanda.yaml files and the shared Dockerfile are well-structured. However, I've identified a few areas for improvement, primarily in the new push_ray_image.py script. There's a critical issue where the script only supports pushing the main ray images, which is a regression from the previous system that also handled ray-extra and ray-llm variants. Additionally, the script duplicates significant portions of the image tagging logic from another file, which will create maintenance challenges. I've also included a suggestion to reduce duplication in the .buildkite/build.rayci.yml file using YAML anchors.
|
|
||
| # GPU_PLATFORM is the default GPU platform that gets aliased as "gpu" | ||
| # This must match the definition in ci/ray_ci/docker_container.py | ||
| GPU_PLATFORM = "cu12.1.1-cudnn8" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU_PLATFORM constant is hardcoded here and is also defined in ci/ray_ci/docker_container.py. Duplicating constants can lead to inconsistencies if one is updated and the other is not.
This constant should be defined in a single, shared location and imported where needed. This could be done as part of the larger refactoring of the tag generation logic I mentioned in another comment.
fe68786 to
46cc862
Compare
288cadb to
164e2e4
Compare
46cc862 to
70df756
Compare
164e2e4 to
b4fb3e0
Compare
70df756 to
ae32d45
Compare
b4fb3e0 to
cd79253
Compare
ae32d45 to
04cfd26
Compare
cd79253 to
7d257e4
Compare
04cfd26 to
55d1473
Compare
7d257e4 to
e93a821
Compare
55d1473 to
84d7479
Compare
84d7479 to
c275fe2
Compare
8a4bdf5 to
a0b03e9
Compare
c275fe2 to
7c78fa2
Compare
a0b03e9 to
32347fa
Compare
7c78fa2 to
29cf5ab
Compare
29cf5ab to
d566c8a
Compare
8d1057a to
d8901f7
Compare
d566c8a to
9190029
Compare
021b2e1 to
52b7b21
Compare
64ac2fd to
d5b11ff
Compare
8cfa977 to
924d46f
Compare
275827a to
04d6806
Compare
04d6806 to
6e985e9
Compare
6e985e9 to
17d793f
Compare
| @@ -0,0 +1,13 @@ | |||
| name: "ray-extra-py$PYTHON_VERSION-cpu$ARCH_SUFFIX" | |||
| disable_caching: true | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose to disable_caching because the ray-wheel image has its own caching disabled. I need to double check this assumption.
17d793f to
716cc65
Compare
|
Results from #60186: |
| env: | ||
| PYTHON_VERSION: "{{matrix}}" | ||
| ARCH_SUFFIX: "" | ||
| RAY_VERSION: "3.0.0.dev0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be deprecated in favor of an envfile--
leaning towards having an env_file support in wanda spec/config that can read in build args to set, so that we can do labeling in the dockerfile that are based on file contents
716cc65 to
976be9f
Compare
Adds Dockerfile and Wanda files for Ray images, pulling artifacts from previous steps. Buildkite steps also updated to use these, including both build and upload steps. Topic: ray-image Relative: push-script Signed-off-by: andrew <[email protected]>
976be9f to
f4e1306
Compare
f4e1306 to
73da838
Compare
a0e2570 to
06d411f
Compare
| echo "Error: Expected 1 ray wheel file, but found ${#WHEEL_FILES[@]} in /home/ray/." >&2 | ||
| ls -l /home/ray/*.whl >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glob validation fails to detect zero wheel files
Low Severity
The wheel file validation logic uses a bash glob pattern without setting nullglob, so if no files match /home/ray/ray-*.whl, the array contains the unexpanded literal pattern string as one element. The check ${#WHEEL_FILES[@]} -ne 1 then incorrectly passes, and the error message would claim "found 1" wheel file when actually there are zero. The pip install would still fail eventually, but with a confusing "file not found" error rather than the intended descriptive message.
Adds Dockerfile and Wanda files for Ray images, pulling artifacts from previous steps. Buildkite steps also updated to use these, including both build and upload steps.
Topic: ray-image
Relative: push-script
Signed-off-by: andrew [email protected]